a. Prior solutions and their disadvantages

Prior solutions include: 1) software reordering of graphics primitives at the application level, for
example as necessary in OpenGL [OpenGL92]; 2) multipass rendering with a dedicated augmented
frame buffer [Mammen89]; 3) limiting users to 4 levels with a hardware chip [Kelley94][Winner97];
4) the software A-buffer [Carpenter84]; 5) screen door transparency such as that implemented in
the SGI RealityEngine [Akeley93]; 6) dedicated sorting hardware.

The disadvantages of the software techniques (1, 2, 4) are the inefficiency of having an
application sort primitives. The sorting is viewpoint dependent, so for 3D graphics rendering, as the
viewpoint is altered a new sorting of all primitives would be done, or held in a previously created data
structure, which may also require cutting primitives that intersect other primitives to break
potential cycles.

The disadvantages of the previous hardware techniques are either quality trade-offs, such as only
supporting a fixed number of layers (3) or reducing the spatial resolution via a dithering technique
called screen door transparency, which also supports only a fixed number of transparent layers (5), or
tremendous cost, with many dedicated memories and circuitry only for the sorting (6).

There has been no prior solution that provides economical true transparency, without changing
the application, in a 3D graphics rendering architecture.

b. Problems solved by the invention

The problem solved by the invention is how to compute a 3D graphics image whose primitives
are partially transparent, while providing the ease of use of a traditional Z-buffering architecture. The
problem is a long standing one in graphics, and involves the proper sorting of the primitives in the
depth ordering along view rays. A primary advantage of the invention is that it solves this in
hardware, with an economical (few gates) method. Also, there are no trade-offs in quality, as any
number of layers at a given pixel may be supported, and the memory cost is amortized across the
entire screen.



c. Description of the construction and operation of the invention

The invention is fully disclosed in the attached HP Confidential technical report.
This invention would be added as described to a 3D rendering graphics ASIC or ASICs, to
implement true transparency in hardware. A proposed hardware architecture, as well as specific
hardware control, are given in the referenced technical report.

d. Advantages of the invention over what has been done before

The advantages of the invention over what has been done before are: the invention does not
require modification of the application; the invention does not force the application to sort primitives
(triangles) to work properly; the invention is economical, as it requires only simple modifications to
the Z-comparison and compositing logic of existing hardware; the invention provides
new features without compromising existing features; and the invention may also incorporate
antialiasing.

True Transparency with the Fragment Buffer Graphics Architecture



The fragment buffer is a new method for providing computation for true transparency of rendered
fragments. True transparency is provided without altering the application, without requiring the
application to sort the data, and without the deficiencies of previous methods such as screen door
transparency and the A-buffer. The fragment queue or fragment buffer can compute true transparency
with any number of layers. A variant of the fragment buffer that was designed for minimal hardware
complexity, with maximum algorithmic improvement, is simulated. Statistics are shown for a variety of
different scenes using a trace based methodology and an instrumented Mesa(TM) OpenGL implementation.
The fragment buffer is shown to require from 2.1 to 3.6 times more memory than traditional Z-buffering
to provide true transparency. Detailed hardware design is provided, including the state transition
diagrams, next state table, and architectural schematics. The fragment buffer can also be used for
antialiasing, and an example of Carpenter's classical A-buffer antialiasing is shown. A key invention
for antialiasing is the modification of Carpenter's recursive algorithm into iterative front-to-back
processing.



1 Introduction 

The fragment buffer is about achieving greater graphics visual realism through novel use of resources. This
paper discusses the architecture and shows examples of a rendering simulator that uses the fragment buffer.
Sutherland, Sproull and Schumacker [11] explained hidden surface algorithms as sorting to determine what
is visible on the screen. Object space primitives are sorted to screen space locations in X, Y, and Z. Most
architectures compare the Z location with the existing values in a Z buffer, and decide what can be thrown
out or overwritten into the Z buffer. The technique is simple, and fast in hardware. It has dominated
graphics architectures for nearly two decades. But there are deficiencies. It is difficult to use Z buffering
with complex shading and/or texturing algorithms. The reason is that a large number of pixel values are
overwritten, so expensive shading can be overly burdensome. Another primary difficulty is that Z buffering
is a read-modify-write, and so an actual sort is not being done. Therefore, true transparency is not possible
efficiently on a Z buffering architecture. An additional difficulty of Z buffering is that antialiasing is expensive
and requires expanding the Z buffer to the number of subsamples used for antialiasing.

Compute intensive shading can be done with a sort last approach [7]. If the sorting of an object primitive
to the screen is left until the last step of the graphics pipeline, it is called a sort last technique, from Molnar
et al.'s proposed parallel rendering taxonomy. PixelFlow [8] used sort last so that all pixels were determined
to be visible or not, and then shaded, which makes shading and/or texturing work proportional to the frame
buffer size, not the object complexity. As models become large, this is an important advantage. PixelFlow's
primary difficulty is the bandwidth and subdivision necessary to composite entire screens between different
graphics pipelines. A sort last approach solves the inefficiencies of the Z buffering architecture, but does not
provide correct transparency.
Improved methods for correct transparency have been investigated by Mammen [6], Carpenter [3], Kelley
et al. [5, 12], and others [2]. The proposed techniques are either software only [3], require multiple
passes, or render only a fixed number of transparent layers [5, 12]. Transparency is a challenging problem
to solve in hardware, and other techniques such as screen door, or sorting the polygons back-to-front in
the application, have been used. There are quality problems with screen door transparency [1], essentially
a dithering technique, and requiring applications to send the polygons in sorted order is not general to
legacy applications or easy to do.

Improved methods for antialiasing have been investigated throughout the graphics literature, and numerous
techniques exist. In hardware, antialiasing has been done most directly through supersampling [1, 8].
Adaptations of the A-buffer [3] to hardware have also been investigated, with partial implementations
relative to the generality of the A-buffer approach [12]. The difficulty of economically implementing
antialiasing has meant that most graphics architectures support antialiasing with multiple passes through
the geometry.

The bottom row of Figure 1 shows what happens in OpenGL when rendering 3 transparent squares of
red, green, and blue. A different image results from each different drawing order, even though the 3 squares
have a fixed Z depth location. On the top row of Figure 1, different drawing order does not impact the visual
appearance. This shows the results of true transparency with the invention of the fragment buffer.

Figure 1: Top row, fragment buffer, same appearance. From near to far, the squares are ordered Blue, Green,
Red. Bottom row, OpenGL, different every time.
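
The order dependence described for Figure 1 can be reproduced with a few lines of arithmetic. The following is a minimal sketch, not the simulator used in this paper: the square colors, depths, and the 50% opacity are illustrative assumptions, and `over` is the standard Porter and Duff over operator applied to an opaque destination. Blending in submission order stands in for plain OpenGL blending, while sorting back to front before blending stands in for the result the fragment buffer is meant to deliver.

```python
from itertools import permutations


def over(src_rgb, alpha, dst_rgb):
    """Porter-Duff 'over' of a source with the given alpha onto an opaque destination."""
    return tuple(alpha * s + (1.0 - alpha) * d for s, d in zip(src_rgb, dst_rgb))


# Hypothetical squares: (name, z, rgb, alpha). Larger z is further from the eye,
# so near to far the order is blue, green, red, as in the Figure 1 caption.
SQUARES = [
    ("blue", 1.0, (0.0, 0.0, 1.0), 0.5),
    ("green", 2.0, (0.0, 1.0, 0.0), 0.5),
    ("red", 3.0, (1.0, 0.0, 0.0), 0.5),
]
BACKGROUND = (0.0, 0.0, 0.0)


def blend_in_draw_order(squares):
    """Blend each square as it is submitted, ignoring depth order."""
    color = BACKGROUND
    for _name, _z, rgb, alpha in squares:
        color = over(rgb, alpha, color)
    return color


def blend_back_to_front(squares):
    """Sort by depth first, then blend from the furthest square to the nearest."""
    ordered = sorted(squares, key=lambda sq: sq[1], reverse=True)
    return blend_in_draw_order(ordered)


draw_order_results = {blend_in_draw_order(p) for p in permutations(SQUARES)}
sorted_results = {blend_back_to_front(p) for p in permutations(SQUARES)}
print(len(draw_order_results), "distinct colors when blending in draw order")
print(len(sorted_results), "distinct color when blending back to front")
```

With these assumed inputs every submission order gives a different pixel color in the unsorted case, while the depth-sorted case always gives the same color.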

By the addition of a memory, called the fragment buffer, proper transparency ordering and antialiasing
can be economically implemented in hardware. I show how, for example, correct transparency can be
implemented in such an architecture. I also show how the A-buffer algorithm and adaptive antialiasing,
or nonuniform sampling, can be implemented. Essentially, a frame buffer is used for storing the closest
opaque fragment, or the furthest transparent fragment if there aren't any opaque fragments. For pixels with
additional fragments, the other fragments are sent to the fragment buffer along with their X and Y location.
In successive passes, the fragments are considered, and composited [9] (Porter and Duff) into the frame
buffer. Only 1 pass is needed for processing the geometry, so no extra storage is needed for geometry, and
a single fragment buffer is shared for the entire screen. This amortization of the extra storage over the
entire screen allows unique savings over techniques with large per pixel dedicated storage. For architectures
that do screen based subdivision, such a fragment buffer fits in naturally. Bucketization of primitives and/or
fragments reduces the storage requirements of the X and Y location, as it is addressed only within a tile.

True transparency and adaptive antialiasing have only been attempted in hardware in high end graphics
image generators for flight simulation [2]. The use of a fragment buffer is flexible and efficient for the
calculation of proper transparency, without multiple passes of the geometry. Multiple passes of fragments
are efficiently done, and many fragments are culled and eliminated with Z and occlusion tests. Screen
space subdivision, which allows for the reduced communication requirements, also provides for reducing the
amount of sorting necessary. And, antialiasing can be done efficiently without dedicating a large amount of
memory per pixel. The buffer area may be partitioned to provide efficient separation of different types of
fragments, and the novel ease of nonuniform sampling in a hardware pipeline. Experiments show the required
memory to support true transparency with highly detailed models to be from 2.1 to 3.6 times more memory
than a traditional Z buffer. Additionally, a reformulation of Carpenter's software recursive antialiasing
technique shows how to perform iterative front-to-back antialiasing with this proposed hardware. Many
issues need to be investigated, such as the implementation of stencils and the support of all OpenGL modes.
But the advantages, and the proposed performance levels, make the architecture an advance beyond what
has been possible.

The main inventions describe how to achieve correct ordering for transparency and how to achieve
antialiasing economically. Section 2 describes the fragment buffer architecture in the context of a Z-buffering
graphics rendering architecture. Section 3 shows the state transition diagrams, and next state tables, for
the comparison logic. Section 4 shows the results with several test data sets. Section 5 discusses fragment
buffer variants, and Section 6 concludes the paper.



2 Fragment Buffer Graphics Hardware

To obtain the most economical hardware, an invention that augments a conventional Z buffer is explored.
Many variations of the invention are possible, and are discussed further in Section 5. The scenario is as
follows: the architecture is a standard graphics pipeline, with geometry processing and rasterization
(R). We add a fragment buffer, and a 2nd Z storage to the frame buffer. This configuration provides the
maximum advantage with the minimal additional memory. Figure 2 shows the architecture. Figure 3 shows
more details of the new processing. A fragment coming from rasterization or from the fragment buffer is
multiplexed into the fragment buffer comparison and controller. The fragment buffer is considered to be a
circular queue, and if the queue overflows because of excessive fragments then it can be paged to system
memory. Because accesses are sequential and used on a first-in first-out (FIFO) basis, performance will
degrade gracefully. A fragment is defined as a point sample with color, opacity, and depth resulting from the
rasterization (R), as in RGBAZ. Depending on the fragment's opacity, depth, and the previous state of the
frame buffer at that location, a fragment may be stored on the fragment buffer, composited into the frame
buffer, or discarded. The fragment buffer comparison and controller is where the Z-buffering comparison
typically takes place. The Z-buffering is now augmented and revised to provide true transparency and
antialiasing. The processing is demonstrated through an example.

Figure 2: Graphics architecture schematic plus added fragment buffer, and second Z buffer.

Figure 3: Detailed diagram of interface to fragment buffer and frame buffer.



Figure 4 shows a pixel with 8 fragments, one of which is opaque, O. The transparent fragments are labelled
T1 to T4 and Tx, Ty, and Tz. The fragments are drawn in the order shown, 1, 2, 3, ..., 8, and processing
occurs as shown in the figure. Processing occurs by first considering the fragments during rasterization.
This is phase 1. Then, if there have been fragments placed into the fragment buffer, following passes are
performed. Only a single pixel per screen location is saved, so when multiple fragments that are unoccluded
lie upon a single pixel, they are sent to the fragment buffer. This example has 6 passes: phase 1, phase 2,
phase 3_1, phase 3_2, phase 3_3, and phase 3_4 (the successive iterations of phase 3). For a shorthand
notation, the frame buffer is labelled B, the next Z value, or 2nd Z value, is B_nz, next Z prime is B'_nz,
and the fragment under consideration is F.

Figure 4 shows how during phase 1, any opaque layers are found. In this case the transparent fragments
closer than the opaque layer were all queued: T1, T2, T3, and T4. An underline indicates the fragment in
the fragment buffer. The transparent layers beyond the opaque layer, Tx and Ty, were also queued. In phase
2, these fragments, Tx and Ty, are culled, and the furthest transparent layer's Z is saved. The fragments
are processed in the same order each time as they are read from and written to the fragment buffer. Note
that the true depth complexity of the pixel is 5, but we took 6 passes. In cases where no transparent
fragments further than the opaque layer are queued, there is one less pass, 5. Phase 2 culls fragments Ty
and Tx as they are further from the eye point than the opaque layer O. Next, in phase 2, fragment T1 has its
Z value saved as B_nz, and is put on the fragment buffer, as shown by S. Fragments T2, T3, and T4 are also
re-queued. In phase 3_1, the frame buffer, B, holds the opaque Z, B_z, and color attributes. B_nz, or NextZ,
is also in the frame buffer, and holds the proper Z value for the furthest transparent layer, T1. In phase
3_1, the fragments come out of the fragment buffer in the same order that they were placed there. First,
fragment T1 is read, and its Z value matches B_nz. Therefore, it is immediately composited with the frame
buffer. The next fragment read, T2, has the furthest Z that is closer than B_nz, so B'_nz = T2_z (B next
Z prime) is written, and the fragment is re-queued (S). Fragments T3 and T4 are considered and re-queued
(S).

Phase 3_2 continues the same as phase 3_1, with the remaining fragments on the fragment buffer. These
are T2, T3, and T4. Note that the B'_nz, or NextZprime, of phase 3_1 is the B_nz, or NextZ, of phase 3_2.
This alternation of the interpretation of B_nz and B'_nz continues for each even and odd phase 3 needed.
The Z storage used is the frame buffer Z and the 2nd frame buffer Z: always just 2 Z values per pixel.

Phase 3_3 starts with a buffer, B, that is the previously composited opaque and furthest two transparent
fragments. B_nz is the Z value of fragment T3. T3 is the first fragment considered, so it is matched
with B_nz and also composited. Fragment T4's Z is set to B'_nz. Then in phase 3_4, fragment T4 is considered
again, matches B_nz, and is composited into the final correct pixel color. Once the fragment buffer is emptied,
processing completes for that frame.

Figure 4: Fragment processing example, for Z-buffer with extra Z and fragment buffer.

Table 1: Use of 2nd or extra Z buffer. The phase next z prime (B'_nz) and B_nz alternate interpretation,
while their location is in B_nz or B_z of the frame buffer physical location. See on the left.

The fragment buffer must contain the location of the fragments, as fragments for the whole screen are
intermixed on the fragment buffer. Figure 5 shows possible fragments on the fragment buffer, with location
(X_o, Y_o) intermixed with (X_p, Y_p). The state machines that control processing for the fragment buffer
are described in the next section.

Xp, Yp, Z2, A2
Xp, Yp, Z4, A4
Xo, Yo, X, X    <- fragment from a different location
Xp, Yp, Z3, A3
Xp, Yp, Z5, A5

Figure 5: Fragments on the fragment buffer.
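
The multi-pass walk-through above can be condensed into a small software model. What follows is a simplified sketch under stated assumptions, not the hardware state machine of the next section: it assumes smaller Z is closer to the eye, treats any fragment with alpha below 1 as transparent, always spends phase 2 on culling (the full state tables can start compositing in phase 2 for some pixels), and uses Python dictionaries where the hardware would use the frame buffer, the Z buffer, and the extra Z buffer. The names `render`, `next_z`, and `next_z_prime` are illustrative stand-ins for the controller, B_nz, and B'_nz.

```python
from collections import deque

INF = float("inf")


def over(src_rgb, alpha, dst_rgb):
    """Porter-Duff 'over' of a source with the given alpha onto an opaque destination."""
    return tuple(alpha * s + (1.0 - alpha) * d for s, d in zip(src_rgb, dst_rgb))


def render(fragments, background=(0.0, 0.0, 0.0)):
    """fragments: (x, y, z, rgb, alpha) tuples in draw order. Returns (colors, passes)."""
    color = {}       # B: frame buffer color per pixel
    opaque_z = {}    # B_z: depth of the closest opaque fragment per pixel
    queue = deque()  # fragment buffer shared by the whole screen, used first-in first-out

    # Phase 1: rasterization. Keep the closest opaque fragment, queue transparent ones.
    for x, y, z, rgb, alpha in fragments:
        p = (x, y)
        if z >= opaque_z.get(p, INF):
            continue  # hidden by an opaque fragment that was already seen
        if alpha >= 1.0:
            opaque_z[p], color[p] = z, rgb
        else:
            queue.append((x, y, z, rgb, alpha))
    passes = 1
    if not queue:
        return color, passes

    # Phase 2: cull queued fragments behind the final opaque layer and record the
    # furthest surviving depth per pixel (the role of B_nz).
    next_z = {}
    for _ in range(len(queue)):
        frag = queue.popleft()
        p, z = (frag[0], frag[1]), frag[2]
        if z >= opaque_z.get(p, INF):
            continue
        next_z[p] = max(next_z.get(p, -INF), z)
        queue.append(frag)
    passes += 1

    # Phase 3, repeated: composite the fragment whose depth matches B_nz, re-queue
    # the rest while tracking the next furthest depth (the role of B'_nz).
    while queue:
        next_z_prime = {}
        for _ in range(len(queue)):
            frag = queue.popleft()
            x, y, z, rgb, alpha = frag
            p = (x, y)
            if z == next_z[p]:
                color[p] = over(rgb, alpha, color.get(p, background))
            else:
                next_z_prime[p] = max(next_z_prime.get(p, -INF), z)
                queue.append(frag)
        next_z = next_z_prime  # B'_nz of this pass becomes B_nz of the next pass
        passes += 1
    return color, passes


# The pixel of Figure 4 with assumed depths: O is opaque, T1..T4 lie in front of it,
# and Tx, Ty, Tz lie behind it.
GRAY = (0.5, 0.5, 0.5)
EXAMPLE = [
    (0, 0, 6.0, GRAY, 0.5),             # Tx
    (0, 0, 7.0, GRAY, 0.5),             # Ty
    (0, 0, 5.0, (1.0, 1.0, 1.0), 1.0),  # O
    (0, 0, 2.0, (0.0, 1.0, 0.0), 0.5),  # T3
    (0, 0, 4.0, (1.0, 0.0, 0.0), 0.5),  # T1
    (0, 0, 1.0, (0.0, 0.0, 1.0), 0.5),  # T4
    (0, 0, 3.0, (1.0, 1.0, 0.0), 0.5),  # T2
    (0, 0, 8.0, GRAY, 0.5),             # Tz
]
colors, passes = render(EXAMPLE)
print(passes, colors[(0, 0)])
```

For this assumed pixel the sketch takes 6 passes, in line with the example above, and the final color does not depend on the order in which the fragments are submitted.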



3 State Machine Specification for Single Frame Buffer Solution 

Figures 6, 7, and 8 show the state transition diagrams of frame buffer pixels during processing. Figure 9
shows the fragment buffer comparison and controller state, beginning with initialization into phase 1 after a
new frame. The number of phases varies depending on the frame's depth complexity, as illustrated in the
example in the previous section. Phase 1 is always first, and all fragments are considered during rasterization.
After all fragments from rasterization have been processed, any fragments that were placed on the fragment
buffer are considered in phase 2, and so on. The processing terminates when the fragment buffer is emptied.
For phase 3, processing alternates between odd and even phases as shown in Figure 9.

Each frame buffer location has a unique state, so each pixel is a state machine, but only the state machine
for the location of the current fragment needs to be considered. The state machine requires 6 states for
phase 1, 3 states for phase 2, and 3 states for phase 3. The 6 states have been labelled as shown in Table 2.



BOTH_INVALID      initial state, no fragments seen here
VALID_OPAQUE      1 opaque fragment seen and stored
VALID_TRANS       1 transparent fragment seen and stored
OPAQUE_INV        an opaque fragment stored closer than queued fragments
BOTH_VALID_T_O    at least two fragments, opaque and transparent
BOTH_VALID_T_T    at least two fragments, both transparent

Table 2: State assignment definitions for the 6 states in phase 1.

The same state assignments are reused for all 3 phases. Of course interpretation varies between phases.
The 2nd and 3rd phases will generate an error if a fragment is located where zero or 1 were seen in phase
1. Phase 1 states have been separated into 3 columns. In the left column, no fragment has been seen, and
therefore B and B_nz are both invalid. In the middle column, B is filled. In the right column, B and B_nz are
filled and at least 1 fragment is on the fragment buffer.

For this implementation, the rules for equal valued depths are that the earlier fragment is in back of
the following fragments. For a transparent fragment to be seen, its Z must be less than the opaque Z. After
the fragment buffer is emptied, the rendering and compositing are complete. If there is not more than
1 visible opaque or transparent fragment per pixel, then processing completes in phase 1.

Figure 7: Phase 2 frame buffer pixel state transition diagram.

Figure 8: Phase 3 frame buffer pixel state transition diagram. For phase 3, B_nz and B'_nz alternate in storage
location between even and odd phases.
current state   inputs                         outputs/side-effects               next state
BOTH_INVALID    F_o                            B = F                              VALID_OPAQUE
BOTH_INVALID    F_t                            B = F                              VALID_TRANS
VALID_OPAQUE    F_z >= B_z                     none (cull fragment)               VALID_OPAQUE
VALID_OPAQUE    F_o, F_z < B_z                 B = F                              VALID_OPAQUE
VALID_OPAQUE    F_t, F_z < B_z                 B_nz = F_z, queue(F)               BOTH_VALID_T_O
VALID_TRANS     F_o, F_z <= B_z                B = F (cull B, replace frame)      VALID_OPAQUE
VALID_TRANS     F_o, F_z > B_z                 queue(B), B_nz = B_z, B = F        BOTH_VALID_T_O
VALID_TRANS     F_t, F_z <= B_z                B_nz = F_z, queue(F)               BOTH_VALID_T_T
VALID_TRANS     F_t, F_z > B_z                 queue(B), B_nz = B_z, B = F        BOTH_VALID_T_T
BOTH_VALID_T_O  (F_x), F_z >= B_z              none (cull fragment)               BOTH_VALID_T_O
BOTH_VALID_T_O  F_o, F_z < B_z & F_z > B_nz    B = F                              BOTH_VALID_T_O
BOTH_VALID_T_O  F_o, F_z < B_z & F_z <= B_nz   B = F, B_nz = 0 (invalidate B_nz)  OPAQUE_INV
BOTH_VALID_T_O  F_t, F_z < B_z & F_z > B_nz    B_nz = F_z, queue(F)               BOTH_VALID_T_O
BOTH_VALID_T_O  F_t, F_z < B_z & F_z <= B_nz   queue(F)                           BOTH_VALID_T_O
BOTH_VALID_T_T  F_o, F_z > B_z                 B_nz = B_z, queue(B), B = F        BOTH_VALID_T_O
BOTH_VALID_T_T  F_o, F_z <= B_z & F_z > B_nz   B = F (replace frame)              BOTH_VALID_T_O
BOTH_VALID_T_T  F_o, F_z <= B_z & F_z <= B_nz  B = F, B_nz = 0 (replace frame)    OPAQUE_INV
BOTH_VALID_T_T  F_t, F_z > B_z                 B_nz = B_z, queue(B), B = F        BOTH_VALID_T_T
BOTH_VALID_T_T  F_t, F_z <= B_z & F_z > B_nz   B_nz = F_z, queue(F)               BOTH_VALID_T_T
BOTH_VALID_T_T  F_t, F_z <= B_z & F_z <= B_nz  queue(F)                           BOTH_VALID_T_T
OPAQUE_INV      F_z >= B_z                     none (cull fragment)               OPAQUE_INV
OPAQUE_INV      F_o, F_z < B_z                 B = F                              OPAQUE_INV
OPAQUE_INV      F_t, F_z < B_z                 queue(F)                           OPAQUE_INV

Table 3: State transition table for phase 1. queue(X) means to place fragment X on the fragment buffer or
queue. B = frame buffer, F = fragment being considered, F_o = fragment opaque, F_t = fragment transparent,
(F_x) = opaque/transparent don't care, F_z = fragment depth value, B_z = frame buffer depth value,
B_nz = frame buffer next z depth value.
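
The phase 1 transitions can also be read as a per-pixel update function. The sketch below is one possible software rendering of Table 3, written under assumptions: the names `State`, `Fragment`, `Pixel`, and `phase1_step` are illustrative and do not come from the hardware design, the fragment buffer is modelled as a plain Python list, and an invalidated B_nz is represented by `None` rather than by the value 0.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class State(Enum):
    BOTH_INVALID = 0
    VALID_OPAQUE = 1
    VALID_TRANS = 2
    OPAQUE_INV = 3
    BOTH_VALID_T_O = 4
    BOTH_VALID_T_T = 5


@dataclass(frozen=True)
class Fragment:
    z: float          # depth, smaller is closer to the eye
    opaque: bool
    label: str = ""


@dataclass
class Pixel:
    state: State = State.BOTH_INVALID
    b: Optional[Fragment] = None   # B: fragment held in the frame buffer
    b_nz: Optional[float] = None   # B_nz: next Z value, None when invalid


def phase1_step(px: Pixel, f: Fragment, queue: List[Fragment]) -> None:
    """Apply one phase 1 transition for fragment f arriving at pixel px."""
    s = px.state
    if s is State.BOTH_INVALID:
        px.b = f
        px.state = State.VALID_OPAQUE if f.opaque else State.VALID_TRANS
    elif s is State.VALID_OPAQUE:
        if f.z >= px.b.z:
            return                                  # cull fragment
        if f.opaque:
            px.b = f
        else:
            px.b_nz = f.z
            queue.append(f)
            px.state = State.BOTH_VALID_T_O
    elif s in (State.VALID_TRANS, State.BOTH_VALID_T_T):
        if f.z > px.b.z:                            # F lies behind the stored B
            queue.append(px.b)
            px.b_nz, px.b = px.b.z, f
            px.state = State.BOTH_VALID_T_O if f.opaque else State.BOTH_VALID_T_T
        elif f.opaque:                              # transparent B is hidden
            px.b = f
            if s is State.VALID_TRANS:
                px.state = State.VALID_OPAQUE
            elif f.z > px.b_nz:
                px.state = State.BOTH_VALID_T_O
            else:
                px.b_nz, px.state = None, State.OPAQUE_INV
        else:
            if px.b_nz is None or f.z > px.b_nz:
                px.b_nz = f.z
            queue.append(f)
            px.state = State.BOTH_VALID_T_T
    elif s is State.BOTH_VALID_T_O:
        if f.z >= px.b.z:
            return                                  # cull fragment
        if f.opaque:
            px.b = f
            if f.z <= px.b_nz:
                px.b_nz, px.state = None, State.OPAQUE_INV
        else:
            if f.z > px.b_nz:
                px.b_nz = f.z
            queue.append(f)
    else:                                           # OPAQUE_INV
        if f.z >= px.b.z:
            return                                  # cull fragment
        if f.opaque:
            px.b = f
        else:
            queue.append(f)


# Usage: a transparent fragment, then an opaque one behind it, then a closer transparent one.
pixel, fragment_buffer = Pixel(), []
for frag in (Fragment(4.0, False, "T1"), Fragment(5.0, True, "O"), Fragment(3.0, False, "T2")):
    phase1_step(pixel, frag, fragment_buffer)
print(pixel.state.name, [f.label for f in fragment_buffer])
```

Folding VALID_TRANS and BOTH_VALID_T_T into one branch is a design choice of this sketch: the two states share the same three-way comparison and differ only in whether B_nz is already valid.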
Figure 9: Overall state machine for the fragment buffer comparison and controller of Figure 3.



current state   inputs                                    outputs/side-effects   next state
BOTH_VALID_T_O  F_z = B_nz                                B = composite(B, F)    BOTH_VALID_T_O
BOTH_VALID_T_O  F_z != B_nz, B'_nz >= B_nz                B'_nz = F_z, queue(F)  BOTH_VALID_T_O
BOTH_VALID_T_O  F_z != B_nz, B'_nz < B_nz, F_z <= B'_nz   queue(F)               BOTH_VALID_T_O
BOTH_VALID_T_O  F_z != B_nz, B'_nz < B_nz, F_z > B'_nz    B'_nz = F_z, queue(F)  BOTH_VALID_T_O
OPAQUE_INV      F_z >= B_z                                none (cull fragment)   OPAQUE_INV
OPAQUE_INV      F_z < B_z & F_z > B_nz                    B_nz = F_z, queue(F)   OPAQUE_INV
OPAQUE_INV      F_z < B_z & F_z <= B_nz                   queue(F)               OPAQUE_INV
(OPAQUE_INV provides the same behavior as BOTH_VALID_T_O of phase 1)

Table 4: State transition table for phase 2. queue(X) means to place fragment X on the fragment buffer or
queue. B = frame buffer, F = fragment being considered, F_o = fragment opaque, F_t = fragment transparent,
(F_x) = opaque/transparent don't care, F_z = fragment depth value, B_z = frame buffer depth value,
B_nz = frame buffer next z depth value, composite(B, F) = Porter and Duff over operator or OpenGL composite.



current state   inputs        outputs/side-effects                                  next state
BOTH_VALID_T_O  same as phase 2 above
OPAQUE_INV      F_z = B_nz    B = composite(B, F), B_z = B_nz                       BOTH_VALID_T_O
                              (or B'_nz = B_nz; B_nz = F_z already)
OPAQUE_INV      F_z != B_nz   B_z = F_z, queue(F) (or B'_nz = F_z)                  BOTH_VALID_T_O

Table 5: State transition table for phase 3. OPAQUE_INV only occurs in phase 3_1, not in phase 3_2, phase
3_3, etc. Note that for phase 3, B_nz and B'_nz are in different physical locations depending on whether
the phase is even or odd. Phase 2 looks like an even phase 3.
If there are two fragments in a pixel that are not occluded, processing may complete in phase 2, and so on.
Tables 3, 4, and 5 provide the state transition tables: the next state transitions as well as the side effects
and outputs. In phase 3, B_nz and B'_nz alternate their physical location as shown below in Table 6. The
outputs and side effects are given in sequential order. For example, in Table 3, 7th row, VALID_TRANS, F_o,
F_z > B_z, the frame buffer fragment is written to the fragment buffer, queue(B), its Z value is saved as
next z, B_nz = B_z, and then it is overwritten by the fragment under consideration, B = F.



physical location   phase 1   phase 2   phase 3 odd   phase 3 even   phase 3 odd
B_nz                          B_nz      B'_nz         B_nz           B'_nz
B_z                           B'_nz     B_nz          B'_nz          B_nz

Table 6: Phase 3, physical location changing meaning from B_nz to B'_nz in odd and even phases.

This state machine has been fully implemented and debugged, and several models have been run through
it as discussed in the next section.



4 Results 

The fragment buffer with a Z buffer and an extra Z buffer, as described in the previous
section, has been implemented. Statistics have been gathered on several models to provide an indication of
the performance implications. Table 7 provides statistics for processing of 4 scenes. All images were 512x512
pixels. The scenes as rendered are given in Figures 11 to 14. Artifacts from OpenGL (middle images) include
the tire in the back seat of the chevy in Figure 13, and the nose gear on top of the helicopter in Figure 14.



data      no. of phase 3   total passes   Z bandwidth   frag bandwidth   avg depth
scene     6                8              2,765,189     7,497,201        2.35
spheres   10               12             8,178,293     36,889,274       3.93
chevy     8                10             3,404,954     8,396,718        2.17
heli      9                11             (illegible)   8,155,?98        2.68

Table 7: Fragment buffer processing statistics, bandwidth in bytes, all 512x512 frames.

The necessary bandwidth to the memory holding the frame buffer, extra Z buffer, and fragment buffer
was computed during the execution of the simulation. This considers the memory model shown in Figure 2,
where the frame buffer, extra Z buffer, and fragment buffer are all in the same memory. Table 7 provides the
memory traffic for the conventional Z buffer (Z bandwidth), and the fragment buffer (frag bandwidth). The
depth complexity is also provided in the table, and is an average over pixels covered, for the case where all
geometry is considered transparent. Figure 10 plots the conventional Z buffering bandwidth against the
fragment buffer bandwidth. For these scenes, it can be seen that the number of passes may be high, on the
order of 8 to 12 passes. For complex interpenetrating transparent objects the depth complexity can be
arbitrarily high at a given pixel. For example, in unstructured volume rendering, the depth complexity will
be much higher, as there are thousands of layers for some pixel locations. But, because the application
sorting of the data prior to rendering is such a burden, even the numerous passes of the fragment buffer
will achieve superior results.

The key thing about the bandwidth numbers is that they are on the same order as the Z-buffering,
and there will also not be any texturing at the time that the fragments are being sorted. Therefore, the
performance impact should be modest, because the [...] passes will not be competing with texture mapping
operations. The ratio of Z buffering traffic to total fragment buffer traffic varies from 2.5 to 4.5 in these
examples. These examples are also severe, in that all surfaces are transparent, so [...] Z buffering, some
difference in processing is [...]. This is expected to vary with higher resolutions, but the number of
fragments will increase in both Z buffering and fragment buffer processing.




Figure 10: Bandwidth for the four scenes shown in Table 7. Conventional Z buffering traffic in bytes is
compared to rendering all surfaces as partially transparent with the fragment buffer. Those with the highest
scene complexity and depth complexity require more bandwidth.




Figure 11: A scene of a cone, torus, and sphere. The left image is Z buffering, the middle image is OpenGL,
and the right image is the fragment buffer.

Figure 15 shows an example of the multiple passes that are taken. In the phase 1 pass, in the upper
left, the rearmost fragments are determined and placed into the frame buffer. On the next pass, in phase 2,
the next furthest transparent layers are composited into the frame buffer. In these renderings, the front and
back faces of triangles are shaded, as the front and rear of the sphere are visible with all surfaces slightly
transparent.

Figure 12: A scene of 7 intersecting spheres. The left image is Z buffering, the middle image is OpenGL,
and the right image is the fragment buffer. This scene is meant to compare to Mammen's Figure 4, where
he also renders 7 intersecting spheres.

Figure 13: A scene of a Caligari trueSpace model of a 1957 Chevy. The left image is Z buffering, the middle
image is OpenGL, and the right image is the fragment buffer.

Figure 14: A scene of a Caligari trueSpace model of an Apache helicopter. The left image is Z buffering,
the middle image is OpenGL, and the right image is the fragment buffer.

The implementation was also rigorously validated with permutations of cases of 3 transparent levels, as
shown in the introduction, and with 3 transparent levels and an opaque layer that was placed in all feasible
locations: in front of all three, at the same depth as each of the three, behind all three, or in between the
1st and 2nd or 2nd and 3rd. The transparent layers could be drawn in 6 orders. The opaque layer was placed
at 7 different locations, and the opaque layer could be drawn in 4 different orders in relation to the
transparent layers. Figure 16 shows 3 examples from the 168 (6*7*4) combinations that were verified.
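
The count of validation cases can be checked by enumerating them. The following sketch only generates the combinations and does not render them; the layer names, the concrete depth values, and the helper names `opaque_depths` and `validation_cases` are assumptions made for illustration.

```python
from itertools import permutations

# Hypothetical depths for the 3 transparent layers; smaller is closer to the eye.
TRANSPARENT_Z = {"T1": 1.0, "T2": 2.0, "T3": 3.0}


def opaque_depths(depths):
    """The 7 opaque placements: in front, coincident with each layer, between, behind."""
    zs = sorted(depths)
    in_front = [zs[0] - 0.5]
    coincident = list(zs)
    between = [(a + b) / 2.0 for a, b in zip(zs, zs[1:])]
    behind = [zs[-1] + 0.5]
    return in_front + coincident + between + behind


def validation_cases():
    """Every (draw order, opaque depth) pair described in the validation paragraph."""
    cases = []
    for order in permutations(TRANSPARENT_Z):              # 6 transparent draw orders
        for oz in opaque_depths(TRANSPARENT_Z.values()):   # 7 opaque depths
            for slot in range(len(order) + 1):             # 4 positions in the draw order
                draw = list(order)
                draw.insert(slot, "O")
                cases.append((tuple(draw), oz))
    return cases


print(len(validation_cases()))
```

Each generated case could then be fed to a renderer and compared against a depth-sorted reference, which is the kind of check the validation describes.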

From the simulation, conclusions can be made regarding the size of the memory needed to support this
functionality. The frame buffer is assumed to contain a fragment, which I define simply as R, G, B, A,
taking 5 bytes or so. Each component of RGB may be 1 byte, and alpha typically needs higher precision for
compositing, which would be necessary for an architecture supporting true transparency. In the examples
shown, all alpha processing is back to front, so an alpha buffer isn't actually needed in this instance but
has been assumed. So, resolution * 5 is the frame buffer storage in bytes. Additionally, a Z buffer is used,
and this is assumed to take 4 bytes. For fragment buffer processing, an additional Z buffer is used for 4
additional bytes per pixel. It is assumed that the 3 bits needed for state per pixel can be held within these 4
bytes, or simply added, for 4 bytes + 3 bits.

Now the actual size of the fragment buffer to be used is at least as large as the number of fragments
to be stored in the fragment buffer in phase 1. This is somewhat arbitrary, as it depends on the depth
complexity (DC) of the scene (fully transparent depth complexity), and the percent of the frame buffer
covered. The added memory is then DC * percent_covered * added_storage. The added storage is
straightforward to calculate, as it is simply the sum of a fragment's bytes (5), Z's bytes (4), and the address of
the pixel the fragment maps to (3), to total 12 = 5 + 4 + 3. In this case 3 bytes would be able to address
a 4096x4096 frame buffer. Table 7 detailed the bandwidth on external buses. Table 8 shows the amount of
memory needed for these datasets. The direct measurements show that from 2.13 to 3.63 times the memory is
needed for a system with the fragment buffer. The resolutions can be scaled up, and the assumption that
the same percentage of pixels are covered, and that the depth complexity stays the same, means the ratios
are approximately the same. The only variable affected by resolution is the size of the address necessary to
store on the fragment buffer itself. For a 512x512 frame buffer 18 bits may be used. With slightly more bits,
higher resolutions are achieved: 1024x1024 with 20 bits, 1280x1024 with 21 bits, and 1600x1200 with 22 bits.
This increases the ratios of additional memory needed only slightly (2.13 to 2.17 for the helicopter).
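
The memory accounting above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and its parameters are hypothetical, the byte counts are the ones quoted in the text (5 + 4 bytes per frame buffer pixel, 4 bytes of extra Z, 12 bytes per fragment buffer entry), and the depth complexity and coverage passed in the usage note are made-up inputs, not the measured values behind Table 8.

```python
PIXEL_BYTES = 5 + 4       # R, G, B, A fragment (5) plus Z (4) per frame buffer pixel
EXTRA_Z_BYTES = 4         # additional Z buffer per pixel for fragment buffer processing
ENTRY_BYTES = 5 + 4 + 3   # fragment (5) + Z (4) + pixel address (3) per stored fragment


def memory_estimate(width, height, depth_complexity, percent_covered):
    """Return (frame, extra_z, fragment_buffer, ratio); sizes are in bytes.

    depth_complexity and percent_covered are hypothetical inputs standing in
    for the DC and coverage terms of the text.
    """
    pixels = width * height
    frame = pixels * PIXEL_BYTES
    extra_z = pixels * EXTRA_Z_BYTES
    fragment_buffer = depth_complexity * percent_covered * pixels * ENTRY_BYTES
    ratio = (frame + extra_z + fragment_buffer) / frame
    return frame, extra_z, fragment_buffer, ratio
```

For a 512x512 frame buffer this gives 2.25 MB of frame buffer and 1 MB of extra Z, matching the constant columns of Table 8; for example, `memory_estimate(512, 512, 2, 0.25)` adds a 1.5 MB fragment buffer.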



data       frame buffer   extra Z   fragment buffer   total fragment buffer   total frag/frame buffer
scene      2.25 MB        1 MB      1.57 MB           4.82 MB                 2.14
spheres    2.25 MB        1 MB      4.92 MB           8.17 MB                 3.63
chevy      2.25 MB        1 MB      1.85 MB           5.10 MB                 2.27
heli       2.25 MB        1 MB      1.55 MB           4.80 MB                 2.13



Table 8: Fragment buffer processing statistics, minimum memory usage in MByte, for the 512x512 frame
buffer (constant), extra Z (constant), fragment buffer, total of fragment buffer, frame buffer, and extra Z,
and a ratio of the total fragment buffer memory versus the frame buffer for traditional Z buffering.



But, for tile based hardware, there are additional improvements possible because of the reduced address
to be stored with fragments on the fragment buffer. For example, a 1600x1200 screen requires 11 bits each for X
and Y, for a 22 bit overhead per fragment. Tiling by 64x64 tiles requires 12 bits in total for X and Y (6 each),
versus 22 bits, a 10 bit saving for fragments. The savings is a fixed percentage, no matter how large you decide
to make the fragment buffer, ranging from 10% to 15% for the 512x512 to 2048x2048 resolutions considered with a
64x64 tile. For most graphics the depth complexity is considerably less, so less memory would be consumed.
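
The address widths quoted in this section follow from the number of bits needed for the X and Y coordinates. A minimal sketch, assuming the address is simply an X field concatenated with a Y field (the helper name `address_bits` is hypothetical):

```python
def address_bits(width, height):
    """Bits needed to store a pixel address as separate X and Y fields."""
    x_bits = (width - 1).bit_length()   # enough bits to count 0 .. width - 1
    y_bits = (height - 1).bit_length()  # enough bits to count 0 .. height - 1
    return x_bits + y_bits


# Full-screen address versus tile-local address for a 1600x1200 screen.
full_screen = address_bits(1600, 1200)
per_tile = address_bits(64, 64)
saving = full_screen - per_tile
```

With these definitions the 1600x1200 screen needs 22 bits, a 64x64 tile needs 12 bits, and the saving is 10 bits per fragment, as stated above.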
5 Fragment Buffer Variants

The fragment buffer implementation as described is targeted for lowest cost with reasonable performance.
A continuum of choices trading off memory versus number of passes is possible. In addition, the memory
overhead is reduced for tile based architectures. Antialiasing can also be supported, which provides for
an adaptive supersampling strategy. Table 9 shows the number of passes needed for various hardware
complexities. Case 1 is 1 frame buffer, for which 2n passes are needed, where n is the worst case transparent
layer depth complexity. Case 2 is the case just presented, where a frame buffer is used, and one extra frame
of Z. The number of passes is halved to n. Case 2 also has the advantage that no explicit tile or region
compositing is needed, because compositing occurs when the fragment z value matches the next z value
(F_z == B_nz). Case 4 is the most general, where there are N frame buffers available for the entire screen (or
tile). In this case only n/N passes are necessary, but with the obvious increase in necessary memory.



case   description                            total passes
1      1 frame buffer                         O(2n)
2      1 frame buffer and 1 extra Z buffer    O(n)
3      2 frame buffers                        O(n)
4      N frame buffers                        O(n/N)



Table 9: Trading off memory versus number of passes. Case 2 is the one presented in the previous section.
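
As a rough illustration of Table 9, the pass counts can be written as a small function. The table gives orders of growth, not exact counts, so the exact values and the rounding up for case 4 are assumptions of this sketch; the function name is hypothetical.

```python
import math


def estimated_passes(case, n, num_frame_buffers=1):
    """Approximate passes for worst case transparent layer depth complexity n."""
    if case == 1:                  # 1 frame buffer
        return 2 * n
    if case in (2, 3):             # extra Z buffer, or 2 frame buffers
        return n
    if case == 4:                  # N frame buffers; rounding up is an assumption
        return math.ceil(n / num_frame_buffers)
    raise ValueError("case must be 1, 2, 3, or 4")
```

For 12 transparent layers this gives 24 passes for case 1, 12 for case 2, and 3 for case 4 with four frame buffers.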

In this section we briefly discuss the options of using fragment buffering with a tile based, deferred shading
architecture we call DeferredZ, and how to support antialiasing with the fragment buffer. I show an example
of processing to implement Carpenter's classic A-buffer, a software technique, in hardware. The DeferredZ
architecture is a pipeline that delays the Z compares and sorting, and also allows deferred shading to be
performed. Figure 17 shows the overall DeferredZ architecture. From the left, primitives are sent across
AGP (Accelerated Graphics Port), into the Preculling and XY bucketization stage. The output of this stage
is the screen space primitive vertex triangle strips that survive culling. The next stage is the Hierarchical
Z-culling or culling, where hierarchical Z-culling is performed only for a region of the screen. Following this
is XY rasterization of primitives that are not occluded. In this case rasterization means conversion of the
triangle data to raster XY coordinate location fragments, although all shading attributes such as normals
are forwarded with the fragment.

The next step is the fragment buffer compare logic as described earlier, with compositing moved later
to follow shading. The Fragment Stack is used for temporary storage to consider surviving fragments.
Fragments after the first pass are sent along to the Lighting and Coloring and Shading phase of the pipeline.
Because the primitives are sent in the proper order, they can be directly composited once all lighting
calculations have been done. After this point primitives are sent directly into the frame buffer, which is
distributed in the screen bucket fashion. It is assumed that buckets would be screen square tiles, but this
architecture could also use scanline parallelism, or pixel interleaved parallelism.

There are four memories in the system, labelled A, B, C, and D. There are two new memories in B: the
Fragment Buffer and additional Z-buffers and attribute storage. There may be none or a small number of
additional Z-buffers. Figures 21 and 22 show N = 2, and other cases of N = 0 and N = 1 are analyzed also.
The pipelines are truly independent, and no intercommunication is required between them for single pass
rendering. This assumes that the texture memory is replicated so each pipeline has unfettered access to the
textures that it needs.
Figure 17: DeferredZ overall architecture



5.1 Fragment buffer for correct transparency

This variant of the fragment buffer provides deferred shading and true transparency. Figure 21 shows the
memories including the fragment buffer, depth buffer(s), and attribute buffer(s). Additional depth and
attribute memories may be used, but the rendering algorithm will be described with two Z and attribute
buffers, and other cases will be described following that. The lines surrounding the memories are those in
Figure 17 labelled B, fragment buffer and attribute buffers. The fragment buffer may be implemented as a
stack, RAM, queue, or FIFO, because the order in which the fragments are considered is sequential. All
fragments are read in during a phase, and may be requeued.

Referring to Figure 17, DeferredZ hardware, the XY sort portion performs the geometric processing
necessary to calculate the screen space vertex locations. The HiZ block performs hierarchical Z-buffering as
explored by Ned Greene [4]. At this point are the succeeding primitives, typically triangles defined by their
vertices. The triangles enter the rasterization stage, where fragments are determined. A fragment in the
DeferredZ architecture is the per sample data with Z, a numerically computed perspective depth, and the
attributes, A, that are used for lighting, texturing, and shading.

Definition: Fragment = <Z + Attribute> (attributes including material value, normal, texture coordinates).

The processing in the blocks given in Figure 21 consists of the following:

1. Rasterization: Take succeeding primitives and compute succeeding fragments.

2. Z compare/Z sort/AA compare/AA sort: Take succeeding fragments and process them to determine
the first set of visible fragments. The number of visible fragments determined depends upon the number
of Z and attribute buffers, A.

3. Light/shade/texture/composite: Take visible fragments, and compute the lighting, shading, and
texturing, and composite with themselves and the frame buffer.

The calculation performed in the Z-compare/AA compare, Z-sort/AA sort can be described as occurring
in multiple phases. The first phase is where all triangles (or primitives, where primitives may include
polygons, lines, points, etc.) are rasterized, and sent to the compare and sort block. The compare and sort
block processes them to compute the Z ordering for N layers of attributes. If N is 0, and only the frame
buffer is used, the algorithm is different, as mentioned earlier.

Consider two scenarios to simplify the description: first the scenario where there is a single sample per
pixel, and secondly the scenario where multiple samples per pixel are taken. For the first scenario, the
correct ordering of transparent layers can be determined. Further consider the case where there are two Z
and attribute buffers. The processing of primitives and fragments will take place in multiple phases. Figure
18 shows the pseudo-code for phase 1. Figure 19 shows the pseudo-code for phase 2. And Figure 20 shows
the pseudo-code for phase 3 and higher. Figure 23 shows, for a single pixel, a hypothetical fragment covering
and depth ordering. The eye is shown on the left, and the partially transparent fragments are drawn
as a line, while the opaque fragments are drawn as a line with cross hatching. Let us consider this as an
example and describe the processing that occurs during each phase of Z fragment sorting.

To start in phase 1, Figure 18, all primitives are transformed, occlusion tested, then rasterized. As
rasterized fragments are created, they are inserted into the two buffers to determine the closest opaque
fragment and the furthest unoccluded transparent fragment.

Rasterization:
for all primitives
    rasterize to fragments
    pass fragments to Z compare and Z sort

Z compare and Z sort phase 1:
for all fragments (as received from rasterization) {
    fetch Zmin from memory
    if Z >= Zmin discard fragment
    else if fragment is opaque {
        write Z -> Zmin
        write fragment info into attributes for shading/lighting etc.
    }
    else /* transparent */ {
        fetch Zfar transparent, and A attributes from memory
        if Z > Zfar transparent {
            write Z -> Zfar transparent and overwrite attributes
            write Zfar old and attributes to fragment buffer /* <X, Y, Z, A> written */
        } /* Z > Zfar transparent */
        else
            write Z, attributes to fragment buffer
    } /* else transparent */
} /* for all fragments */



Figure 18: DeferredZ variant, Phase 1 Rasterization and Fragment Sorting.
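
The phase 1 logic of Figure 18 can be sketched in Python as follows. This is an illustrative model, not the hardware design: the `Fragment` and `PixelState` classes and the `phase1` function are hypothetical names, smaller Z is assumed to be closer to the eye, and an attribute of `None` stands in for a cleared valid bit on the Zfar transparent entry (an assumption, since the figure does not show how an empty entry is represented).

```python
import math
from dataclasses import dataclass


@dataclass
class Fragment:
    x: int
    y: int
    z: float
    attr: str
    opaque: bool


@dataclass
class PixelState:
    zmin: float = math.inf    # depth of the closest opaque fragment seen so far
    a_near: object = None     # its attributes
    zfar: float = -math.inf   # depth of the furthest unoccluded transparent fragment
    a_far: object = None      # its attributes; None models "not valid"


def phase1(fragments):
    """Process rasterized fragments in arrival order, as in Figure 18."""
    pixels = {}
    fragment_buffer = []
    for f in fragments:
        p = pixels.setdefault((f.x, f.y), PixelState())
        if f.z >= p.zmin:
            continue                                # occluded: discard
        if f.opaque:
            p.zmin, p.a_near = f.z, f.attr          # new closest opaque layer
        elif f.z > p.zfar:
            if p.a_far is not None:                 # spill the old far entry
                fragment_buffer.append(Fragment(f.x, f.y, p.zfar, p.a_far, False))
            p.zfar, p.a_far = f.z, f.attr
        else:
            fragment_buffer.append(f)               # nearer transparent fragment
    return pixels, fragment_buffer
```

As in the figure, a transparent fragment that arrives before a closer opaque fragment is not removed here; it is left for the later phases to discard.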

After phase 1, when all primitives have been considered, we have a processed frame buffer,
Zfar transparent, Afar transparent, and a fragment buffer for all remaining fragments to be considered
(X, Y, Zf, Af), as shown in Figure 22. Next is phase 2, to merge the remaining fragments on the fragment
buffer. For phase 2, unload the current Z and A buffers to the lighting, shading, and compositing stage,
or send Znear, Anear of the closest opaque layer and Zfar, Afar of the furthest transparent fragment. Note that
in this form of the invention a valid or changed bit would be needed to know 
unlike the fragment buffer described earlier. 
Figure 23 shows a case where O1, the closest opaque layer, and T1, the furthest transparent layer, are
discarded. The remaining transparent layers are all sent to the fragment buffer. In this example, all layers
beyond O1 were either occluded or discarded before being sent to the fragment buffer. This is not necessarily
the case, as the order of primitives is arbitrary to start, so fragments that may be occluded by closer, but
later arriving, primitives must be occluded in pass 2. In the fragment buffer of the earlier sections this
event was noted by changing state to OPAQUE_INV, which means an opaque layer invalidated previous
information. For the second and all remaining phases, we will process the remaining transparent fragments
using the same two Z buffers as before. Figure 24 shows the multiple phases, phase 2, phase 3, and phase 4,
to sort the remaining transparent layers. So, given N Z and A (attribute) buffers, it will take a number of
passes where D is the worst case depth complexity, D - 1 transparent layers, and 1 opaque layer.

The algorithm for phase 2 is given in Figure 19. The algorithm for phase 3 and higher is given in
Figure 20. These phases are the same in spirit as the state machine provided earlier, with the difference here
being 2 full attribute and Z buffers, and a frame buffer following compositing. This aspect of the invention
has not yet been implemented.

set Zfar transparent to -infinity
set Zmin to previous Zmin or closest transparent fragment value
for all fragments (retrieved from fragment buffer) {
    fetch Zmin from memory
    if Z >= Zmin discard fragment
    else /* transparent, and not occluded */ {
        fetch Zfar transparent, and A attributes from memory
        if Z > Zfar transparent {
            if Zfar is valid {
                if Zfirstfromfar is valid {
                    Zfirstfromfar, A go to fragment buffer
                }
                move Zfar, Afar to Zfirstfromfar, Afirstfromfar
            }
        } /* Z > Zfar transparent */
        else
            write Z, attributes to fragment buffer
    } /* else transparent */
} /* for all fragments */



Figure 19: DeferredZ variant, Phase 2, discarding further layers, and determining next N - 1 transparent
layers.

This demonstrates the logic of replacement during the fragment buffer processing. The key advantage
of the fragment buffer for transparency is that it computes exactly the correct ordering with a fixed Z buffer,
attribute buffer, and a variable fragment buffer. By only reading out fragments that have been updated,
and invalidating those while writing them out, you can get the valid bits for Zfar and Zfirstfromfar reset very
economically. The technique does not preclude image space subdivision, for example two areas being worked
on, one with lighting and compositing occurring, and one with Z compare/Z sort occurring, to effectively
load balance the work between units. Or, for example, passing along any succeeding fragments with texture
values to be processed, and re-Z-buffered, depending on the mix of work. Next to be described is antialiasing
for all fragments (retrieved from fragment buffer) {
    if Z > Zfar {
        if Zfar is valid {
            if Zfirstfromfar is valid {
                Zfirstfromfar, A go to fragment buffer
            }
            move Zfar, Afar to Zfirstfromfar, Afirstfromfar
        }
        overwrite Zfar with Z
    }
    else if Z > Zfirstfromfar {
        if Zfirstfromfar is valid {
            Zfirstfromfar, Afirstfromfar to fragment buffer
        }
        overwrite Zfirstfromfar with Z
    }
} /* for all fragments */



Figure 20: DeferredZ variant, Phase 3 and higher, determining next N transparent layers,
with the same fragment buffer. 
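
The replacement logic of Figure 20 can be modelled for a single pixel with N = 2 as below. This is a sketch under assumptions: depths alone stand in for fragments, `None` models a cleared valid bit, and the final `else` branch, which requeues a fragment nearer than both buffered entries, is an addition of this sketch that does not appear in the figure. The function names are hypothetical.

```python
def sort_pass(depths):
    """One pass over a pixel's fragment buffer, keeping the two furthest depths."""
    zfar = None      # furthest transparent depth found in this pass
    zfirst = None    # "first from far": the second furthest depth
    requeued = []    # fragments written back to the fragment buffer
    for z in depths:
        if zfar is None or z > zfar:
            if zfar is not None:
                if zfirst is not None:
                    requeued.append(zfirst)   # old second entry is spilled
                zfirst = zfar                 # old far entry moves down
            zfar = z
        elif zfirst is None or z > zfirst:
            if zfirst is not None:
                requeued.append(zfirst)
            zfirst = z
        else:
            requeued.append(z)                # assumption: nearer fragments are requeued
    return zfar, zfirst, requeued


def back_to_front(depths):
    """Repeat passes until the fragment buffer is empty; also count the passes."""
    order, passes = [], 0
    while depths:
        zfar, zfirst, depths = sort_pass(depths)
        order.extend(z for z in (zfar, zfirst) if z is not None)
        passes += 1
    return order, passes
```

Each pass extracts the two furthest remaining layers, so five transparent layers are ordered back to front in three passes.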

5.2 Fragment Stack for A-buffer/Carpenter Antialiasing

An instructional example to demonstrate the flexibility of the invention is to show how the A-buffer, an
antialiasing scheme by Loren Carpenter [3], may be implemented. Some key things to note about the
A-buffer are that it has been often emulated, is a software only technique, and performs correct transparency as
well as antialiasing. Because it is a software only technique, showing how it may be practically implemented
in hardware is a significant advancement beyond the state of the art. Other hardware architectures have
emulated some part of Carpenter's A-buffer, such as Stephanie Winner et al. [12], but with trade-offs such
as handling only a fixed number of fragments per pixel, say 4. Others such as Molnar et al. [8, 10] have
claimed to be able to perform A-buffer processing, but such claims are not supported. PixelFlow cannot
sort transparent layers because its compositing network is only a Z-comparison/Z-buffering network. To
do proper resorting and compositing is an unsolved problem for a PixelFlow, or sort-last, architecture.

The A-buffer defines a fragment as a polygon clipped to a pixel boundary. There are two cases: either a pixel
is fully covered by an opaque polygon, or a pixel is partially covered by transparent and/or opaque polygons.
The data structure uses a linked list of fragments, sorted from front-to-back by their minimum Z, shown in
Figure 26. A pixel struct stores both cases of a pixel. Figure 25 shows the pixel struct.

The mask is a 4x8 bit mask, representing subsample locations that the polygon covers. Two Z values are
saved for each fragment, a minimum and maximum Z value, to aid in blending fragments that overlap. The
result of processing is an array of pixel structs, the size of the frame buffer, and linked lists of fragments of
variable length for each pixel.

Now, to implement the same data structures with the fragment buffer requires sending the fragments
for each pixel on to the fragment buffer if they are not the closest N fragments, where we have N Z and
attribute buffers. For earlier examples N is 1 or 2, and so I use N = 2 for this example also. All polygons
are rendered and converted to fragments, and the 2 fragments closest to the eye are stored in the random
access attribute buffers, while all other fragments are sent to the fragment buffer.





IIP. Confidential 



[Figure 21 diagram: Rasterization, Z compare and Z sort/AA compare/AA sort, and coloring/lighting stages, with
Attribute Buffer 1, Attribute Buffer 2, the transparent fragment buffer, and Texture Memory. Data passed between
stages: succeeding primitives, succeeding fragments, visible fragments, lighted and composited fragments.]

Figure 21: Memories for true transparency and improved antialiasing in the B, fragment buffer and attribute
buffers, portion.
Figure 22: Buffers and fragment buffer after first phase of rasterization and sorting.
Figure 23: Example of succeeding opaque and transparent layers.
Figure 24: Remaining phases (phase 2, 3, and 4) for determining sort order for transparent layers of Figure 23.

pixel struct / fragment
float zmax, zmin

Figure 25: A short int is 12 bits, and both an area and opacity are used to more accurately determine pixel
coverage.
Array of pixel struct

Figure 26: A-buffer data structures from Carpenter's software technique.
Figure 27: Fragment buffer for antialiasing, N = 2, Z and attribute buffers, the fragment buffer, and the
hardware modules at this point in the pipeline.



Figure 27 shows a schematic of the fragment buffer, and the two Z and attribute buffers. In the case
for the transparency only, we sorted from back-to-front; now, according to the A-buffer method, we sort
front-to-back, by fragment minimum Z. So, in essence, we have the first two fragments for each pixel in a
dedicated random access array, and all others are thrown onto the fragment buffer.

A-buffer processing requires a recursive consideration of the fragments. Instead, a multipass consideration
of fragments is performed to compute the same result. Also, multiple buffers are used when traversing. First,
how the front-to-back ordering is determined: the $Z_1$ and $Z_2$ buffers store the near Z's, while the attribute
buffer stores the far Z, or delta Z: $Z_{min} = Z_{min}$ and $Z_{max} = Z_{min} + Z_{delta}$. This allows fewer bits to be
used for $Z_{max}$. All fragments are considered by insertion and possible replacement of $Z_1$ and $Z_2$ as described
earlier. Recall the earlier example, here shown sorted after phase 1, in Figure 28.



[Figure 28 diagram labels: Eye, fragment buffer.]

Figure 28: A-buffer example with same fragments as sorted in Figure 23.

Figure 28 shows that now we keep transparent fragments T5 and T4 in the $Z_1$ and $Z_2$, and all other
fragments are sent to the fragment buffer. The sorting is done by nearest to implement A-buffer. Fragments
T3, T2, T1, O1 are sent to the fragment buffer, as well as fragments beyond O1, which would have been
discarded in back-to-front ordering, T 9i 0 Xi T x , T X. We continue to sort as shown below in Figure 29.

Fragments T3 and T2 are captured as the next two closest fragments. All other fragments are again sent
to the fragment buffer. In the next pass through the fragments, the transparent layer T1 and the opaque layer
O1 are found, and all other fragments are discarded. The lighting and shading/compositing module blends
these fragments as they fully cover the pixel, so they are simply processed front-to-back as shown here. To
show how processing proceeds when fragments partially cover a pixel, I provide an example of coverage. The
fragments are still all sorted front-to-back, but now they are not necessarily composited in that order. A
recursion is created by recirculating the fragments again.

A fragment's coverage of a pixel determines the inside, subsamples covered by the polygon, and outside,
subsamples not covered by a polygon. Figure 30 shows the coverage of four polygons A, B, C, and D over
a square pixel area. Note that improved fragment representations, such as those that use D v slopes
Figure 29: A-buffer example continued from Figure 28, combining the next two and
the next two.



can also be used. I am using Carpenter's algorithm for clarity, and would recommend implementing a more
sophisticated algorithm. The polygons would be processed into fragments for this pixel, with A, B, C, D
front-to-back after the first phase of processing. The A and B polygon fragments would be in the $Z_1$ and $Z_2$
buffers. The masks, here shown as 4x4, would be part of the attribute buffer. When the $Z_1$, $A_1$ (attribute),
and $Z_2$, $A_2$ buffers are unloaded, the search mask is computed. The search mask is the binary mask for the
outside of the current accumulated fragments, $M_{search}$. For this example the search mask is first set to the
area outside of the fragment A, $M_{search} = A_{out}$; then this search mask is used to compute in and out,
from front and back of polygon fragment B. The equations are directly from Carpenter, page 105:

$$M_{in} = M_{search} \cap M_f$$

And the interesting thing that we can do is to change a recursive calculation to a purely iterative one by
knowing that compositing can be made fully associative if computed with transparencies. The accumulated
transparency is calculated in the shade/composite circuitry, and stored in the frame buffer. Essentially take
the following equation from Carpenter

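$$C = C_{in} \times A_{in} + C_{out} \times (1 - A_{in})$$

as recursed for our example:

$$C = C_{inA} \times A_{inA} + C_{outA} \times (1 - A_{inA})$$
$$C_{outA} = C_{inB} \times A_{inB} + C_{outB} \times (1 - A_{inB})$$
$$C_{outB} = C_{inC} \times A_{inC} + C_{outC} \times (1 - A_{inC})$$
$$C_{outC} = C_{inD} \times A_{inD} + 0$$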

Here $C$ is color, $C_{in}$ is the color from the inside covered area of the fragment, and $A_{in}$ is the area coverage of
the fragment. The final color for the pixel is $C$. Here Carpenter's approach is converted to purely a sum
in terms of transparency:

$$C = C_{inA} \times A_{inA} + t_A \times C_{inB} \times A_{inB} + t_{AB} \times C_{inC} \times A_{inC} + t_{ABC} \times C_{inD} \times A_{inD}$$

There are four separate contributions from the four separate layers, where the transparency of polygon
fragment A is $t_A = (1 - A_{inA})$. The equations show how the recursion may be converted to direct evaluation.
Figure 30 shows how the masks for coverage of each fragment are initially calculated. The $A_{in}$ in the compositing
equations are those masks, constrained by the search mask. The search mask is updated as the sorting
processes fragments from front-to-back. As before, fragments A, B, C, and D are sorted front-to-back, and
Figure 30: Four polygons' coverage masks for a single pixel.



shown in Figure 31. The search mask is updated front-to-back, and used at each step to compute the $A_{in}$,
or an approximation to the area. The second column, $M_{in} = M_{search} \cap M_f$, is used for $A_{in}$ at those
steps, and each $A_{in}$ is used to update the accumulating transparency. $A_{in}$ for A = $M_{in}$ for A = 10/16.
Then $t_A = 1 - 10/16$. $A_{in}$ for B equals $M_{inB} = 3/16$, $t_B = 1 - 3/16$, and $t_{AB} = (1 - 10/16) \times (1 - 3/16)$.
tABC » (13/16} x (M/18) 

Figure 31 shows clearly how the equation using transparencies is built up in time. I show below the
pipeline processing and calculation for the final result. The current fragment and the search mask are used
to compute mask in and the next search mask.
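
The conversion from Carpenter's recursive form to the direct sum over accumulated transparencies can be checked numerically. The sketch below is illustrative only: each layer is a hypothetical (color, area coverage) pair listed front-to-back, colors are single floats rather than RGB, and the function names are not from the original.

```python
def composite_recursive(layers):
    """C = C_in * A_in + C_out * (1 - A_in), recursing into the layers behind."""
    if not layers:
        return 0.0                      # nothing behind the last layer
    (c_in, a_in), rest = layers[0], layers[1:]
    return c_in * a_in + composite_recursive(rest) * (1.0 - a_in)


def composite_iterative(layers):
    """Direct evaluation: sum of t * C_in * A_in with accumulated transparency t."""
    color = 0.0
    t = 1.0                             # nothing in front of the first layer
    for c_in, a_in in layers:
        color += t * c_in * a_in
        t *= 1.0 - a_in                 # becomes t_A, then t_AB, then t_ABC, ...
    return color
```

Because only the running color and the running transparency are needed, the iterative form can consume fragments one at a time in front-to-back order, which is what allows the accumulated values to be kept in the frame buffer between passes.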

The search mask is computed by $M_s = M_{out} = M_s \cap \overline{M_{f,currentfragment}}$, and the mask for inside the fragment
is computed by $M_{in,currentfragment} = M_s \cap M_{f,currentfragment}$. The mask calculation unit has space for 3 masks;
the current fragment mask is loaded from one of the attribute buffers, $A_1$ or $A_2$. The masks are used to
compute the opacity, or roughly the area, by summing the bits in the $M_{in}$ mask.
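
A minimal sketch of the mask calculation, assuming a 4x4 subsample mask held as a 16 bit integer. The function names and the two example masks in the usage note are hypothetical (the actual masks of Figure 30 are not reproduced here); they were chosen only so that the areas match the 10/16 and 3/16 values quoted in the text.

```python
FULL_MASK = 0xFFFF  # 4x4 subsample mask, one bit per subsample


def mask_step(search, frag_mask):
    """Return (M_in, next search mask) for the current fragment."""
    m_in = search & frag_mask                     # subsamples newly covered
    next_search = search & ~frag_mask & FULL_MASK # subsamples still uncovered
    return m_in, next_search


def area(mask):
    """Approximate area coverage by summing the set bits of a 4x4 mask."""
    return bin(mask).count("1") / 16.0


def walk(frag_masks):
    """Process fragment masks front-to-back, tracking accumulated transparency."""
    search, t, steps = FULL_MASK, 1.0, []
    for m in frag_masks:
        m_in, search = mask_step(search, m)
        a_in = area(m_in)
        steps.append((a_in, t))   # t is the transparency in front of this fragment
        t *= 1.0 - a_in
    return steps, search, t
```

For example, `walk([0x03FF, 0x1F00])` models a fragment A covering 10 subsamples followed by a fragment B that covers 3 of the 6 subsamples A left open, leaving 3 subsamples in the search mask.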

When the fragment pixel location has been processed, the search mask for that location must be stored to
memory in the attribute buffer, and reloaded. The same is true for the transparency and the accumulated
color. To implement the A-buffer, the search mask would have to be added to random access memory storage,
as in $Z_1$, $Z_2$, $A_1$, $A_2$, and $M_s$ for the sort and compare memories. This would allow for fragments from
different pixel locations to be considered in any order, as the mask state will always be available.



6 Conclusions 

The economical implementation of true transparency has been shown through the invention of the fragment
buffer. A simulation was developed for the best fit for a traditional Z buffering architecture. The detailed
hardware architecture and control logic were presented. Simulation results were shown as partial validation
of the design with several models. The fragment buffer can provide true transparency with additional off
chip DRAM storage, used in combination with new comparison, control, and compositing logic. Z buffering
architectures already include most of the comparison logic and compositing. The novel aspects are a state
machine per pixel, and multiple phases of processing recirculating fragments.
Figure 31: $M_f$, $M_{in}$, and $M_{search}$ are computed as the fragments are sorted from front to back, shown by
the timeline going from top to bottom.
The architecture has the advantage over previous work of 1) not requiring high per pixel dedicated storage,
2) supporting any depth complexity as long as the average depth complexity is bounded, 3) not requiring
on chip storage, 4) not requiring multiple geometry passes, 5) not requiring software sorting of primitives,
and 6) not being an approximation, like screen door transparency. The benefits are substantial, but much
further work is needed. Examples were provided for how such a scheme could provide antialiasing, and also
work in a tile based deferred shading architecture. There is additional off chip memory required, from 2.1 to
3.6 times more memory for scenes with 12 layers per pixel and average depth complexities of 2.17 to 3.93 (for
covered pixels). Because the fragment access is linear, a graceful degradation by paging fragments to system
memory would provide greater capacity. The hardware logic is straightforward and can be incorporated into
the current Z compare and composite of a traditional architecture. A key invention is the modification of
a recursive software technique to an iterative front-to-back hardware technique. There are tradeoffs in the
amount of memory used versus the required number of passes, so that the ideal architecture will depend on
the price point and available semiconductor technology.